Declarative Information Extraction, Web Crawling, and Recursive Wrapping with Lixto
نویسندگان
چکیده
Lixto is a system and method for the visual and interactive generation of wrappers for Web pages under the supervision of a human developer, for automatically extracting information from Web pages using such wrappers, and for translating the extracted content into XML. This paper describes some advanced features of Lixto, such as disjunctive pattern definitions, specialization rules, and Lixto’s capability of collecting and aggregating information from several linked Web pages.
منابع مشابه
Report on the Eighth International Workshop on Knowledge Representation Meets Databases ( KRDB ) , September 15 , 2001
The Eighth International Workshop on Knowledge Representation Meets Databases (KRDB) was held at the Ponti cia Universit a Urbaniana, in Rome, right after VLDB 2001. KRDB was initiated in 1994 to provide an opportunity for researchers and practitioners from the two areas to exchange ideas and results. This year's focus was on Modeling, Querying andManaging Semistructured Data. The one day progr...
متن کاملVisual Web Information Extraction with Lixto
We present new techniques for supervised wrapper generation and automated web information extraction, and a system called Lixto implementing these techniques. Our system can generate wrappers which translate relevant pieces of HTML pages into XML. Lixto, of which a working prototype has been implemented, assists the user to semi-automatically create wrapper programs by providing a fully visual ...
متن کاملDeclarative Web data extraction and annotation
We propose a software architecture for semantics-based annotation of data extracted from Web sources. Starting from the LiXto suite, which enables semi-automated extraction of XML data from regular documents, we present a solution for attaching background information to individual tags by means of so-called decorations. Decoration is carried out as an inferential activity in the formal context ...
متن کاملIntelligent Wrapping from PDF Documents
Wrapping is the process of navigating a data source, semiautomatically extracting data and transforming it into a form suitable for data processing applications. The semi-structured form of web pages, coupled with the availability of business-relevant data, has led to the availability of several established products on the market for wrapping data from the Web. One such approach is the Lixto me...
متن کاملWeb Information Acquisition with Lixto Suite: A Demonstration∗
We demonstrate the Lixto Suite, a web data extraction and transformation software kit for retrieving and converting information from various sources to various customer devices. With the Lixto Suite, non-technical content managers can rapidly develop applications in the areas of M-Commerce, E-Commerce, content integration and corporate portals.
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2001